Abstract

Scientific journals are the repositories of mankind's gradually
accumulating knowledge of the surrounding world. Just as knowledge
is organised into classes ranging from major disciplines, subjects and
fields to increasingly specific topics, journals can also be categorised
into groups using various metrics. In addition, they can be ranked
according to their overall influence. However, according to recent studies,
the impact, prestige and novelty of journals cannot be characterised by
a single parameter such as the impact factor. To deepen our insight into
the impact of a journal, our work addresses the evaluation of journal
relevance using complex multi-dimensional measures. Thus, for the first
time, our objective is to organise journals into multiple hierarchies
based on citation data. The two approaches we use are designed to address
this problem from different perspectives. We use a measure related to the
notion of m-reaching centrality and find a network that shows a journal's
level of influence in terms of the direction and efficiency with which
information spreads through the network. We also obtain an alternative
network using a suitably modified nested hierarchy extraction method
applied to the same data. In this case, the journals self-organise into
branches according to the major scientific fields, where the local
structure of each branch reflects the hierarchy within the given field:
usually the most prominent journal (according to other measures) in the
field is chosen by the algorithm as the local root, and more specialised
journals are positioned deeper in the branch. This can make navigation
within different scientific fields and sub-fields very simple, being
equivalent to navigating the different branches of the nested hierarchy,
and should, for example, be very helpful when choosing the most
appropriate journal for a given manuscript. According to our results, the
two alternative hierarchies show a somewhat different but also consistent
picture of the intricate relations between scientific journals and, as
such, they also offer a new angle of view on how our scientific knowledge
is organised into networks.


Introduction 


Providing an objective ranking of scientific journals and mapping them into
different knowledge domains are very complex problems of significant
importance, which can be approached in a number of different ways.
Probably the most widely known quality measure is the impact factor
(Garfield, 1955, 1999), corresponding to the total number of citations a
journal receives in a 2-year period, divided by the number of published
papers over the same period. Although it is a rather intuitive quantity,
the impact factor also suffers from serious limitations (Opthof, 1997;
Seglen, 1997; Harter and Nisonger, 1997; Bordons et al., 2002). This
consequently led to the introduction of alternative measures such as the
H-index for journals (Braun et al., 2006), the g-index (Egghe, 2006), the
Eigenfactor (Bergstrom, 2007), the PageRank and the Y-factor (Bollen et
al., 2007), the Scimago Journal Rank (The Scimago Journal & Country Rank),
and the use of various centralities such as the degree, closeness or
betweenness centrality in the citation network between the journals
(Bollen et al., 2005; Leydesdorff, 2007). Comparing the advantages and
disadvantages of the different impact measures and examining their
correlation has attracted considerable interest in the literature (Bollen
et al., 2009; Franceschet, 2010a,b; Glanzel, 2011; Kaur et al., 2013).
However, according to the results of the principal component analysis of
39 quality measures carried out by Bollen et al. (2009), scientific impact
is a multi-dimensional construct that cannot be adequately measured by any
single indicator. Thus, the development of higher dimensional quality
indicators for scientific journals is an important objective for current
research.

In this study we consider different possibilities for defining a hierarchy
between scientific journals based on their citation network. The advantage
of using the network approach for representing the intricate relations
between journals is that networks can show substantially different aspects
compared to any parametric method representing the journals with points in
a single- or even a multi-dimensional space. When organised into a
hierarchy, the most important and prestigious journals are expected to
appear at the top, while lesser known journals are expected to be ranked
lower. However, a hierarchy offers a more complex view of the ranking
between journals compared to a one-dimensional impact measure. For
example, if the branches of the hierarchy are organised according to the
different scientific fields, then the journals in a given field can be
compared simply by zooming into the corresponding branch in the hierarchy.
Possible scenarios for hierarchical relations between scientific journals
have recently been suggested by Iyengar and Balijepally (2015). However,
the main objective of this earlier study was to examine the validity of a
linear ordering between the journals based on a dominance ranking
procedure. Here we construct and visualise multiple hierarchies between
the journals, offering a far more complex view of the ranking between
journals compared to a one-dimensional impact measure.

Hierarchical organisation in general is a widespread phenomenon in nature
and society. This is supported by several studies, focusing on the
transcriptional regulatory network of Escherichia coli (Ma et al., 2004),
the dominant-subordinate hierarchy among crayfish (Goessmann et al.,
2000), the leader-follower network of pigeon flocks (Nagy et al., 2010),
the rhesus macaque kingdoms (Fushing et al., 2011), neural networks
(Kaiser et al., 2010), technological networks (Pumain, 2006), social
interactions (Guimera et al., 2003; Pollner et al., 2006; Valverde and
Sole, 2007), urban planning (Krugman, 1996; Batty and Longley, 1994),
ecological systems (Hirata and Ulanowicz, 1985; Wickens and Ulanowicz,
1988), and evolution (Eldredge, 1985; McShea, 2001). However, hierarchy is
a polysemous word, and in general, we can distinguish between three
different types of hierarchies when describing a complex system: the
order, the nested and the flow hierarchy. In the case of an order
hierarchy, we basically define a ranking, or more precisely a partial
ordering, of the set of elements under investigation (Lane, 2006). A
nested hierarchy (also called an inclusion hierarchy or containment
hierarchy) represents the idea of recursively aggregating the items into
larger and larger groups, resulting in a structure where higher level
groups consist of smaller and more specific components (Wimberley, 2009).
Finally, a flow hierarchy can be depicted as a directed graph, where the
nodes are layered in different levels so that the nodes that are
influenced by a given node (i.e., are connected to it through a directed
link) are at lower levels.

Hierarchical organisation is an important concept also in network theory
(Ravasz et al., 2002; Trusina et al., 2004; Clauset et al., 2008; Pumain,
2006; Corominas-Murtra et al., 2011; Mones et al., 2012; Corominas-Murtra
et al., 2013). The network approach has become a ubiquitous tool for
analysing complex systems, from the interactions within cells,
transportation systems, the Internet and other technological networks,
through to economic networks, collaboration networks and society (Albert
and Barabasi, 2002; Mendes and Dorogovtsev, 2003). Grasping the signs of
hierarchy in networks is a non-trivial task with a number of possible
different approaches, including the statistical inference of an underlying
hierarchy based on the observed network structure (Clauset et al., 2008),
and the introduction of various hierarchy measures (Trusina et al., 2004;
Mones et al., 2012; Corominas-Murtra et al., 2013). What makes the
analysis of hierarchy even more complex is that it may also be context
dependent. According to a recent study on homing pigeons, the hierarchical
pattern of in-flight leadership does not build upon the stable,
hierarchical social dominance structure (pecking order) evident among the
same birds (Nagy et al., 2013).


In this study we show that, in a somewhat similar fashion, scientific
journals can also be organised into multiple hierarchies of different
types. Our studies rely on the citation network between scientific papers
obtained from the Web of Science (ISI Web of Knowledge). On the one hand,
the flow hierarchy analysis of this network based on the m-reaching
centrality (Mones et al., 2012; Borgatti, 2003) reveals the structure
relevant from the point of view of knowledge spreading and influence. On
the other hand, the alternative hierarchy obtained from the same network
with the help of an automated tag hierarchy extraction method (Tibely et
al., 2013) highlights a nested structure with the most interdisciplinary
journals at the top and the very specialised journals at the bottom of the
hierarchy.


Scientific publication data 

The data set on which our studies rely consists of all the available
publications in the Web of Science (ISI Web of Knowledge) between 1975 and
2011. The downloading scripts we used are available in (WOS publication
data downloading scripts). In order to take into account as wide a list of
papers as possible, we did not apply any specific filtering. Thus,
conference proceedings and technical papers also appear in the data set.
However, since the network we study builds upon citations between papers
(or journals), the conference proceedings, technical papers (or even
journals) with no incoming citations fall out of the flow hierarchy
analysis automatically. (Nevertheless, if they have outgoing citations,
these are included in the evaluation of the m-reaching centrality of other
journals.) Furthermore, even when cited, conference proceedings do not
have a real chance of getting high in any of the hierarchies considered
here, due to their very limited number of publications compared to
journals. Although highly cited individual conference proceedings
publications may appear, they cannot boost the overall citation count of
the proceedings to the level of journals (e.g., whenever a scientific
breakthrough is published in a conference proceedings first, it is usually
also published in a more prestigious journal soon afterwards, which
eventually drives the citations to the journal instead of the
proceedings). For these reasons, the conference proceedings are ranked at
the bottom of the hierarchies we obtained.

We used the 11-character-long abbreviated journal issue field in the core
data for identifying the journal of a given publication. The advantage of
using this field is that it contains only an abbreviated journal name
without any volume numbers, issue numbers, years, and so on (in contrast,
the full journal name in some cases may contain the volume number or the
publication year as well, which, of course, vary over time). The total
number of publications for which this data field was non-empty reached
35,372,038, and the number of different journals identified based on this
data field was 13,202. As mentioned previously, in the case of conference
proceedings, the 11-character-long abbreviated journal issue field was
treated the same as in the case of journal publications, without any
filtering.


Flow hierarchy based on the m-reaching centrality 


A recently introduced approach for quantifying the position of a node in a
flow hierarchy is based on the m-reaching centrality (Mones et al., 2012).
The basic intuition behind this idea is that reaching the rest of the
network should be relatively easy for the nodes high in the hierarchy, and
more difficult for the nodes at the bottom of the hierarchy. Thus, the
position of node i in the hierarchy is determined by its m-reaching
centrality (Borgatti, 2003), $C_m(i)$, corresponding to the fraction of
nodes that can be reached from i by following directed paths of at most m
steps (where m is a system dependent parameter). Naturally, a higher
$C_m(i)$ value corresponds to a higher position in the hierarchy, and the
node with the maximal $C_m(i)$ is chosen as the root. However, this
approach does not specify the ancestors or descendants of a given node in
the hierarchy; instead, it provides only a ranking between the nodes of
the underlying network according to $C_m(i)$. Nevertheless, hierarchical
levels can be defined in a simple way: after sorting the nodes in
ascending order, we can sample and aggregate nodes into levels so that in
each level, the standard deviation of $C_m$ is lower than a pre-defined
fraction of the standard deviation in the whole network. This method of
constructing a flow hierarchy based on the m-reach (and the standard
deviation of the m-reach) has already been shown to provide meaningful
structures for a couple of real systems, including electric circuits,
transcriptional regulatory networks, e-mail networks and food webs (Mones
et al., 2012).
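
The level construction described above can be illustrated with a short
Python sketch. It is only one possible reading of the procedure: the
greedy sweep over the sorted values, the function name build_levels and
the use of the population standard deviation are assumptions made for
illustration, not details taken from Mones et al. (2012).

```python
from statistics import pstdev


def build_levels(reach, fraction):
    """Group nodes into levels by their reach values.

    reach: dict mapping node -> C_m value (hypothetical input format).
    fraction: allowed within-level standard deviation, as a fraction of
              the standard deviation over all nodes.
    Returns a list of levels ordered from the lowest reach to the highest,
    so the last level corresponds to the top of the hierarchy.
    """
    limit = fraction * pstdev(list(reach.values()))
    ordered = sorted(reach, key=reach.get)  # ascending order of C_m
    levels = []
    current = []
    for node in ordered:
        candidate = current + [node]
        spread = pstdev([reach[n] for n in candidate])
        if current and spread > limit:
            # Adding this node would exceed the allowed spread:
            # close the current level and start a new one (assumed rule).
            levels.append(current)
            current = [node]
        else:
            current = candidate
    if current:
        levels.append(current)
    return levels
```

With this greedy rule a smaller fraction yields more, narrower levels,
while a larger fraction merges neighbouring levels.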

When applying this approach to the study of the hierarchy between 
scientific journals, we have to take into account that journals are not directly 
connected to each other; instead they are linked via a citation network 
between the individual publications. In principle we may assume different 
‘journal strategies’ for obtaining a large reach in this system: for example, 
a journal might publish a very high number of papers of poor quality with 
only a few citations each. Nevertheless, taken together they can still provide 
a large number of aggregated citations. Another option is to publish a lower 
number of high-quality papers, obtaining a lot of citations individually. To 
avoid having a built-in preference for one type of journal over the other, we 
define a reaching centrality that is not sensitive to such details, and
which depends only on the number of papers that can be reached in m steps
from publications appearing in a given journal.

First, we note that when calculating the reach of the publications, the
citation links have to be followed backwards: that is, if paper i cites
j, then the information presented in j has reached i. Thus, the reaching
centralities are evaluated in a network where the links point from a
referenced article to all papers citing it. The m-reach of a journal J,
denoted by $C_m(J)$, is naturally given by the number of papers that can
be reached in at most m steps from any article appearing in the given
journal. Thus, the mathematical definition of $C_m(J)$ is based on the set
of m-reachable nodes, given by

$$\mathcal{C}_m(J) = \{\, i \mid d_{\mathrm{out}}(j,i) \le m,\ j \in J,\ i \notin J \,\}, \qquad (1)$$

where $d_{\mathrm{out}}(j,i)$ denotes the out-distance from paper j to i
(i.e., the distance of the papers when only consecutive out-links are
considered). The set $\mathcal{C}_m(J)$ is equivalent to the set of papers
outside J that can be reached in at most m steps, provided that the
starting publication is in J. The m-reaching centrality of J is simply the
size of the m-reachable set, $C_m(J) = |\mathcal{C}_m(J)|$ (i.e., the
number of papers in $\mathcal{C}_m(J)$). Fig. 1 shows an illustration of
the calculation of the m-reach of the journals detailed above. We note
that a closely related impact measure for judging the influence of
research papers based on deeper layers of other papers in the citation
network is given by the wake-citation-score (Klosik and Bornholdt, 2014).
A comparison study between the m-reach and the wake-citation-score is
given in the Supplementary Information S1.

Figure 1: A schematic illustration of the calculation of the journals'
m-reach. a) The papers are represented by grey nodes, connected by
directed citation links, while the journals correspond to the coloured
sets. b) When calculating the m-reach, the links have to be reversed. For
example, in the case of a 3-reach, articles in journal B (orange) can
reach 4 papers in total (the ones in journals A, C and D), excluding
journal B itself, while journal H (dark blue) has a reach of 6 (given by
articles in journals F and G). Note that if we switched to the network
between journals, B would have a reach of 5 (as it can reach journals A,
C, D, E and F), while journal H would have a reach of 2 (since it could
reach only journals F and G).
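
As an illustration of the journal-level m-reach defined above, the
following Python sketch runs a depth-limited breadth-first search on the
reversed paper-level citation network. The input format (a list of
(citing, cited) pairs and a paper-to-journal dictionary) and the function
name journal_m_reach are assumptions made for this example; it is a
minimal sketch rather than the implementation used in this study.

```python
from collections import deque


def journal_m_reach(citations, journal_of, m):
    """Count, for every journal, the papers outside it reachable in m steps.

    citations: iterable of (citing, cited) paper pairs (assumed format).
    journal_of: dict mapping paper -> journal (assumed format).
    m: maximal number of steps along reversed citation links.
    """
    # Reverse the citation links: information flows from cited to citing.
    successors = {}
    for citing, cited in citations:
        successors.setdefault(cited, set()).add(citing)

    papers_by_journal = {}
    for paper, journal in journal_of.items():
        papers_by_journal.setdefault(journal, set()).add(paper)

    reach = {}
    for journal, sources in papers_by_journal.items():
        # Multi-source breadth-first search starting from all papers of
        # the journal, stopping once a path uses m steps.
        distance = {paper: 0 for paper in sources}
        queue = deque(sources)
        while queue:
            paper = queue.popleft()
            if distance[paper] == m:
                continue
            for nxt in successors.get(paper, ()):
                if nxt not in distance:
                    distance[nxt] = distance[paper] + 1
                    queue.append(nxt)
        # Only papers outside the journal contribute to its reach.
        reach[journal] = sum(
            1 for paper in distance if journal_of.get(paper) != journal
        )
    return reach
```

Because the search starts from all papers of a journal at once, each
reachable paper is counted once, no matter how many articles of the
journal lead to it.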

In order to determine the optimal value of m, we calculated $C_m(J)$ for
all journals in our data set for a wide range of m values. According to
the results detailed in the Supplementary Information S2, $C_m(J)$ starts
to saturate for the top journals around m = 4. In order to provide a fair
and robust ordering between the journals, here we set m = 3, which
balances two requirements: on the one hand, we still allow multiple steps
in the paths contributing to the reach; on the other hand, we avoid the
saturation effect caused by the exponential increase in the reach as a
function of the maximal path length and the finite system size. (More
details on the tuning of m are given in the Supplementary Information S2,
and the results obtained for other m values are shown in the Supplementary
Information S3.)

Before considering the results, we note that an alternative approach for
studying the citation between journals is to aggregate all papers in a
given journal into a single node, representing the journal itself, in
similar fashion to the works by Leydesdorff et al. (2013, 2014). In this
case the link weight from journal J to journal X is given by the total
number of citations from papers appearing in J to papers in X. In the
Supplementary Information S4 we analyse the flow hierarchy obtained by
evaluating the m-reaching centrality in this aggregated network between
the journals. However, recent works have pointed out that aggregations of
this nature can lead to serious misjudgement of the importance of nodes
(Rosvall et al., 2013; Pfitzner et al., 2013). For instance, an
interesting memory effect of the citation network between individual
papers is that a paper citing mostly biological papers that appears in an
interdisciplinary journal is still much more likely to be cited back by
other biological papers than by papers from other disciplines (Rosvall et
al., 2013). Such phenomena can have a significant influence on the
m-reaching centrality. However, by switching to the aggregated network
between journals we wipe out these effects and introduce a distortion in
the m-reach. Thus, here we stick to the most detailed representation of
the system, given by the citation network between individual papers, and
leave the analysis of the aggregated network between journals to the
Supplementary Information S4. (An illustration of the difference between
the m-reach calculated on the level of papers and on the aggregated level
of journals is given in Fig. 1.)

The results for the top journals according to the m-reaching centrality
at m = 3, based on the publication data available from the Web of Science
between 1975 and 2011, are given in Fig. 2. The hierarchy levels were
defined by allowing a maximal standard deviation of
$0.13 \cdot \sigma(C_m)$ for $C_m$ within a given level, where
$\sigma(C_m)$ denotes the standard deviation of $C_m$ over all journals.
(The effect of changes in the within-level standard deviation of $C_m$ on
the shape of the hierarchy is discussed in the Supplementary Information
S5.) According to our analysis, Science is the most influential journal
based on the flow hierarchy, followed by Nature, with PNAS coming third,
while Lancet and the New England Journal of Medicine form the fourth
level. In general, the top of the hierarchy is strongly dominated by
medical, biological and biochemical journals. For instance, the top
physics journal, Physical Review Letters, appears only on the 13th level,
and the top chemistry journal, the Journal of the American Chemical
Society, is positioned at the 11th level.

For comparison, Fig. S7 in the Supplementary Information S4 shows the top
of the flow hierarchy obtained from the citation network aggregated to the
level of journals. Although Science, Nature and PNAS preserve their
position as the top three journals, relevant changes can be observed in
the hierarchy levels just below, as physical and chemical journals take
over from the biological and medical journals. For instance, Physical
Review Letters is raised from level 13 to level three, while Lancet is
pushed back from level four to level 17. This reorganisation is likely to
be caused by the 'memory' of the citation network described in the work by
Rosvall et al. (2013), namely the fact that a paper citing mostly
biological articles is more likely to be cited by other biological papers,
even if it appears in an interdisciplinary journal. Since biology and
medicine have the highest publication rate among the different scientific
fields, the aggregation to the level of journals has the most severe
effect on the reach of entities obtaining citations mostly from these
fields. Thus, the notable difference between the flow hierarchy obtained
from the citation network of individual papers and from the aggregated
network between journals is yet another indication of the distortion in
centralities caused by link aggregation, pointed out in related, but
somewhat different contexts by Rosvall et al. (2013) and by Pfitzner et
al. (2013).

Figure 2: Top journals in the flow hierarchy according to the m-reaching
centrality $C_m$ at m = 3, based on the scientific publication data from
the Web of Science. The standard deviation of $C_m$ within the individual
hierarchy levels is at most $0.13 \cdot \sigma(C_m)$, where $\sigma(C_m)$
denotes the standard deviation of $C_m$ over all journals. The nodes are
coloured according to the scientific field of the given journal.


Extracting a nested hierarchy 


Categorising items into a nested hierarchy is a general idea that has been
around for a long time in, for instance, library classification systems,
biological classification and also in the content classification of
scientific publications. A closely related problem is the automated
categorisation of free tags appearing in various online content (Heymann
and Garcia-Molina, 2006; Plangprasopchok et al., 2011; Schmitz, 2006;
Damme et al., 2007; Tibely et al., 2013; Velardi et al., 2013). In recent
years the voluntary tagging of photos, films, books, and so on, with free
words has become popular on the internet in blogs, various file sharing
platforms, online stores and news portals. In some cases these phenomena
are referred to as collaborative tagging (Lambiotte and Ausloos, 2006;
Cattuto et al., 2007; Floeck et al., 2011; Cattuto et al., 2009), and the
resultant large collections of tags are referred to as folksonomies,
highlighting their collaborative origin and the "flat" organisation of the
tags in these systems (Lambiotte and Ausloos, 2006; Cattuto et al., 2007,
2009; Mika, 2005; Spyns et al., 2006; Voss, 2007; Tibely et al., 2012).
The natural mathematical representation of tagging systems is given by
hypergraphs (Ghosal et al., 2009; Zlatic et al., 2009).


Revealing the hidden hierarchy between tags in a folksonomy, or a tagging
system in general, can significantly help in broadening or narrowing the
scope of a search in the system, in recommending yet unvisited objects to
the user, or in categorising newly appearing objects (Lu et al., 2012;
Juszczyszyn et al., 2010). Here we apply a generalised version of a recent
tag hierarchy extraction method (Tibely et al., 2013) for constructing a
nested hierarchy between scientific journals. In its original form, the
input of the tag hierarchy extraction algorithm is given by the weighted
co-occurrence network between the tags, where the weights correspond to
the number of shared objects. Based on the z-score of the connected pairs
and the centrality of the tags in the co-occurrence network, the hierarchy
is built bottom up, as the algorithm eventually assigns one or a few
direct ancestors to each tag (except for the root of the hierarchy). The
details of the algorithm are described in the Nested hierarchy extraction
algorithm subsection.

In order to study the nested hierarchy between scientific journals, we
simply replace the weighted co-occurrence network between tags by the
weighted citation network between journals at the input of the algorithm.
Although a tag co-occurrence network and a journal citation network are
different, the two most important properties needed for the nested
hierarchy analysis are the same in both. First, general tags and
multidisciplinary journals have a significantly larger number of
neighbours compared to more specific tags and specialised journals.
Second, closely related tags co-appear more often than unrelated tags,
just as journals focusing on the same field cite each other more often
than journals dealing with independent disciplines. Based on this, the
hierarchy obtained from the journal citation network in this approach is
expected to be organised according to the scope of the journals, with the
most general multidisciplinary journals at the top and the very
specialised journals at the bottom.

We note that since in this case we have to determine which journals are
the most closely related to each other and which are unrelated, rather
than evaluating the overall influence of the journals, we simply use the
number of direct citations from one journal to the other as the weight for
the connections. This is equivalent to taking the m-reach calculated on
the publication level at m = 1, sorting according to the source of the
citations and then summing up the results for the papers appearing in one
given journal. Thus, when constructing the flow hierarchy, we start from
the publication level citation network and evaluate the m-reach at m = 3,
whereas in the case of the nested hierarchy we calculate the publication
level m-reach at m = 1, which technically becomes equivalent to the
journal level citation numbers when summed over the papers appearing in
one given journal.
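
The journal-level citation counts described above can be sketched in a few
lines of Python. The input format and the function name
journal_citation_weights are hypothetical, and skipping citations within
the same journal is an assumption of this example, since the treatment of
such links is not specified here.

```python
from collections import Counter


def journal_citation_weights(citations, journal_of):
    """Sum paper-level citations into journal-to-journal link weights.

    citations: iterable of (citing, cited) paper pairs (assumed format).
    journal_of: dict mapping paper -> journal (assumed format).
    Returns a Counter keyed by (citing journal, cited journal).
    """
    weights = Counter()
    for citing, cited in citations:
        source = journal_of.get(citing)
        target = journal_of.get(cited)
        if source is None or target is None:
            continue  # papers without a journal identifier are skipped
        if source == target:
            continue  # assumption: citations within a journal are ignored
        weights[(source, target)] += 1
    return weights
```

The resulting counts are raw citation numbers; the z-scores used by the
hierarchy extraction are computed from them in a later step that is not
shown here.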


Nested hierarchy extraction algorithm 


Our algorithm corresponds to a generalised version of "Algorithm B"
presented in (Tibely et al., 2013). The main differences are that here we
force the algorithm to produce a directed acyclic graph consisting of a
single connected component, and we allow the presence of multiple direct
ancestors. In contrast, in its original form "Algorithm B" can provide
disconnected components, and each component in the output corresponds to a
directed tree. A further technical improvement we introduce concerns the
calculation of the node centralities. The outline of the method used here
is thus the following: first we carry out "Algorithm B" given in (Tibely
et al., 2013) with a modified centrality evaluation, obtaining a directed
tree between the journals. This is followed by a second iteration where we
'enrich' the hierarchy by occasionally assigning further direct ancestors
to the nodes.

Since "Algorithm B" is presented in full detail in (Tibely et al., 2013),
here we provide only a brief overview. The input of the algorithm is a
weighted directed network between the journals based on the z-score for
the citation links. After throwing away unimportant connections by using a
weight threshold, the node centralities are evaluated in the remaining
network. Here we used a centrality based on random walks on the citation
network between journals with occasional teleportation steps, in a similar
fashion to PageRank. We adopted the method proposed by Lambiotte and
Rosvall (2012), calculating the dominant right eigenvector of the matrix
$M_{ij} = (1 - \alpha) w_{ij} + \alpha s_i^{\mathrm{in}}$, where $w_{ij}$
is the link weight (z-score), $s_i^{\mathrm{in}}$ denotes the in-strength
of journal i (in number of citations), and $\alpha$ corresponds to the
teleportation probability. We have chosen the widely used
$\alpha = 0.15$ parameter value; however, the ordering of the journals
according to the centralities was quite robust with respect to changes in
$\alpha$.
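
A minimal Python sketch of the centrality calculation is given below. It
takes the matrix at face value, with entries (1 - alpha) * w_ij +
alpha * s_i, and finds its dominant right eigenvector by power iteration.
The dense list-of-lists input, the function name teleport_centrality, the
fixed number of iterations and the normalisation of the vector to unit sum
are assumptions made for this example, and any normalisation of the
weights used in Lambiotte and Rosvall (2012) is not reproduced here.

```python
def teleport_centrality(weights, in_strength, alpha=0.15, iterations=200):
    """Dominant right eigenvector of M by power iteration.

    weights: square list of lists, weights[i][j] = w_ij (assumed layout).
    in_strength: list with in_strength[i] = in-strength of node i.
    alpha: teleportation probability.
    Returns the eigenvector scaled so that its entries sum to one.
    """
    n = len(in_strength)
    matrix = [
        [(1 - alpha) * weights[i][j] + alpha * in_strength[i]
         for j in range(n)]
        for i in range(n)
    ]
    vector = [1.0 / n] * n
    for _ in range(iterations):
        updated = [
            sum(matrix[i][j] * vector[j] for j in range(n))
            for i in range(n)
        ]
        total = sum(updated)  # assumed positive for non-negative inputs
        vector = [value / total for value in updated]
    return vector
```

For large sparse networks a sparse matrix representation would be
preferable; the dense form is used here only to keep the sketch short.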

Based on the centralities, a directed tree representing the backbone of
the hierarchy is built from the bottom up, as described in "Algorithm B"
in (Tibely et al., 2013). In the event that we cannot find a suitable
"parent" for node i according to the original rules, we choose the node
with the highest accumulated z-score from all journals that have a higher
centrality than i (where the accumulation runs over the already found
descendants of the given node). This ensures the emergence of a single
connected component, since a single direct ancestor is assigned to every
node (except for the root of the tree). This is followed by a final
iteration over the nodes where we examine whether further 'parents' have
to be assigned or not. The criteria for accepting a node as the second,
third, and so on, direct ancestor of journal i are that it must have a
higher centrality than i, and that its z-score with i has to be larger
than the z-score between i and its first direct ancestor. Note that the
first parent is chosen based on the aggregated z-score instead of the
simple pairwise z-score, as explained in the work by Tibely et al. (2013).
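
The acceptance rule for additional direct ancestors can be sketched in
Python as follows. The dictionary-based inputs, the keying of the z-scores
by (node, candidate) pairs and the function name extra_parents are
assumptions for illustration. Treating a missing z-score between a node
and its first parent as minus infinity is a further assumption, since that
case is not specified above.

```python
def extra_parents(node, first_parent, centrality, zscore):
    """List further direct ancestors of a node beyond its first parent.

    centrality: dict mapping node -> centrality value (assumed format).
    zscore: dict mapping (node, candidate) -> pairwise z-score
            (assumed format and orientation).
    """
    # Pairwise z-score between the node and its first direct ancestor.
    baseline = zscore.get((node, first_parent), float("-inf"))
    accepted = []
    for candidate in centrality:
        if candidate == node or candidate == first_parent:
            continue
        if (node, candidate) not in zscore:
            continue  # unlinked pairs cannot become ancestors
        higher = centrality[candidate] > centrality[node]
        stronger = zscore[(node, candidate)] > baseline
        if higher and stronger:
            accepted.append(candidate)
    return accepted
```

Since only nodes with a higher centrality can be accepted, the enriched
structure remains a directed acyclic graph as long as the centrality
values are distinct.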


Nested hierarchy of scientific journals 

In Fig. 3 we show the top of the obtained nested hierarchy between the
journals, with Nature appearing as the root, while PNAS, Science, New
Scientist and The Astrophysical Journal form the second level. Several
prominent field specific journals such as Physical Review Letters, Brain
Research, Ecology and Journal of the American Chemical Society have both
Nature and Science as direct ancestors. Interestingly, The Astrophysical
Journal is a direct descendant only of Nature, and is not linked under
Science or PNAS. Nevertheless, it serves as a local root for a branch of
astronomy related journals, in a similar fashion to Physical Review
Letters, which can be regarded as the local root of physics journals, or
Journal of the American Chemical Society, corresponding to the local root
of chemical journals. The biological, medical and biochemical journals
form a rather mingled branch under PNAS, with Journal of Biological
Chemistry as the local root and New England Journal of Medicine
corresponding to a sub-root for medical journals. However, Cell and New
England Journal of Medicine are direct descendants of Nature and Science
as well. Interestingly, the brain and neuroscience related journals form a
rather well separated branch with Brain Research as the local root, linked
directly under PNAS, Science and also under Nature.

Figure 3: The top of the nested hierarchy between scientific journals. Due
to the rapidly increasing number of nodes per level, the journals on
levels four and five are organised into multiple rows. The size of the
nodes indicates the total number of descendants (on a logarithmic scale).
The journals positioned above level five with no out-links shown (e.g.,
Europhysics Letters or Bioscience) have descendants on levels that are out
of the scope of the figure.


Comparing the hierarchies 

Although the hierarchies presented in Fig. 2 and Fig. 3 show a great deal of
similarity, some interesting differences can also be observed. The figures
show the top of the corresponding hierarchies, and a significant portion of
the journals ranked high in the hierarchy are the same in both cases.
However, the root of the hierarchies is different (Science in the case of
the flow hierarchy and Nature in the case of the nested hierarchy), and the
level-by-level comparison of Fig. 2 and Fig. 3 shows that a very high
position in the flow hierarchy is not always accompanied by an outstanding
position in the nested hierarchy, and vice versa. For example, Lancet and
New England Journal of Medicine appear much higher in Fig. 2 than in
Fig. 3, while Geophysical Research Letters is just below Nature and Science
in the nested hierarchy and is not even shown in the top of the flow
hierarchy.


[Figure 4 plot. Bottom axis: number of journals; top axis: number of levels
(ticks at 1, 2, 3, 4, 8, 16, 24, 27).]

Figure 4: Comparison between the flow hierarchy and the nested hierarchy
of scientific journals. The Jaccard similarity coefficient $J_f(\ell_f)$ of
the aggregated sets of journals is plotted as a function of the number of
accumulated journals from the top of the hierarchy to level $\ell_f$ in the
flow hierarchy. Circles correspond to the similarity between the two
hierarchies, while squares show the similarity between two random sets of
journals of the corresponding size.

To make the comparison between the two types of hierarchies more
quantitative, we successively aggregated the levels in the hierarchies
starting from the top, and calculated the Jaccard similarity coefficient
between the resulting sets as a function of the level depth $\ell$. Thus,
when $\ell = 1$, we are actually comparing the roots, when $\ell = 2$, the
journals on the top two levels, and so on. However, since the total number
of levels differs between the hierarchies, we refine the definition of the
similarity coefficient by allowing different $\ell$ values in the two
hierarchies, and always choosing the pairs of aggregated sets with the
maximal relative overlap. Therefore, we actually have two similarity
functions,

$$J_f(\ell_f) = \max_{\ell_n} \frac{|S_f(\ell_f) \cap S_n(\ell_n)|}{|S_f(\ell_f) \cup S_n(\ell_n)|},
\qquad
J_n(\ell_n) = \max_{\ell_f} \frac{|S_f(\ell_f) \cap S_n(\ell_n)|}{|S_f(\ell_f) \cup S_n(\ell_n)|},$$

where $S_f(\ell_f)$ and $S_n(\ell_n)$ denote the set of aggregated journals
from the root to level $\ell_f$ in the flow hierarchy and to level $\ell_n$
in the nested hierarchy, respectively. When evaluating $J_f(\ell_f)$ at a
given level depth $\ell_f$ according to the first definition, the set of
aggregated journals in the flow hierarchy, $S_f(\ell_f)$, is fixed, and we
search for the most similar set of aggregated journals from the nested
hierarchy by scanning over the entire range of possible $\ell_n$ values,
choosing the one that gives the maximal Jaccard similarity. Similarly, when
calculating $J_n(\ell_n)$ according to the second definition, the set of
aggregated journals taken from the nested hierarchy, $S_n(\ell_n)$, is
fixed, and the set $S_f(\ell_f)$ yielding the maximal Jaccard similarity is
chosen from the flow hierarchy.
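
The two similarity functions can be illustrated with a minimal Python sketch.
It assumes that each hierarchy is available as a list of levels (sets of
journal identifiers, root level first); the function names and the toy
journal labels "A" to "E" are hypothetical and are not taken from the study.

```python
from typing import Hashable, List, Set


def aggregate_levels(levels: List[Set[Hashable]]) -> List[Set[Hashable]]:
    """Return S(1), S(2), ...: cumulative unions of the levels, root first."""
    aggregated: List[Set[Hashable]] = []
    seen: Set[Hashable] = set()
    for level in levels:
        seen = seen | level  # build a new set so earlier aggregates stay intact
        aggregated.append(seen)
    return aggregated


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard similarity |a & b| / |a | b| (defined as 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_similarity(
    fixed_levels: List[Set[Hashable]], scanned_levels: List[Set[Hashable]]
) -> List[float]:
    """For every depth of the fixed hierarchy, scan all depths of the other
    hierarchy and keep the maximal Jaccard similarity."""
    fixed = aggregate_levels(fixed_levels)
    scanned = aggregate_levels(scanned_levels)
    return [max(jaccard(s_fixed, s) for s in scanned) for s_fixed in fixed]


# Toy hierarchies (hypothetical data, root level first).
flow_levels = [{"A"}, {"B", "C"}, {"D", "E"}]
nested_levels = [{"B"}, {"A", "C"}, {"D"}, {"E"}]

j_f = best_match_similarity(flow_levels, nested_levels)  # J_f(l_f), l_f = 1, 2, 3
j_n = best_match_similarity(nested_levels, flow_levels)  # J_n(l_n), l_n = 1, ..., 4
```

Because the maximum is taken over the depth of the other hierarchy, the two
lists generally differ in length and in values, which is why both functions
are reported separately.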

In Fig. 4 we show the result obtained for $J_f(\ell_f)$ as a function of the
level depth $\ell_f$ in the flow hierarchy (the corresponding $J_n(\ell_n)$
plot for the nested hierarchy is given in Fig. S10 in the Supplementary
Information S6). Besides $J_f(\ell_f)$, in Fig. 4 we also plotted the
expected similarity between the aggregated sets of journals and a random set
of journals of the same size. Since the roots of the hierarchies are
different, the curves start from zero at $\ell_f = 1$, and naturally, as we
reach the maximal level depth, the similarity approaches one, since all
journals are included in the final aggregate. However, at the top levels
below the root, a prominent increase can be observed in $J_f(\ell_f)$, while
the similarity between random sets of journals increases only very slowly in
this region. Thus, the flow hierarchy and the nested hierarchy revealed by
our methods show a significant similarity also from the quantitative point
of view. This is also supported by the remarkably small generalised
Kendall-tau distance of $\tau = 0.16$, obtained by treating the two
hierarchies as partial orders and applying a natural extension of the
standard distance measure between total orders. The definition of the
distance measure and the details of the calculation are given in the
Supplementary Information S7.
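
As a rough orientation for the random baseline described above, the following
back-of-the-envelope estimate may help; it is an approximation under stated
assumptions and not the calculation used for Fig. 4. The symbols $N$ (total
number of journals) and $k$ (size of both sets) are introduced here for
illustration only. If a fixed set of $k$ journals is compared with a
uniformly random set of the same size, the expected overlap is the
hypergeometric mean $k^2/N$, and replacing the intersection by its
expectation gives

```latex
\[
  J_{\mathrm{rand}}(k) \;\approx\; \frac{k^{2}/N}{2k - k^{2}/N}
  \;=\; \frac{k}{2N - k},
\]
```

which grows only as $k/(2N)$ for $k \ll N$ and reaches one at $k = N$, in
line with the slow initial increase and the final convergence to one
described in the text.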

Finally, our hierarchies can also be compared to traditional impact
measures. According to the results detailed in the Supplementary Information
S8, both the flow hierarchy and the nested hierarchy show moderate
correlations with the impact factor, the Scimago Journal Rank and the
closeness centrality of the journals in the aggregated citation network.
Therefore, the general trends shown by the hierarchies are consistent with
previously introduced, widely used impact measures. However, when looking
into the details, they also provide an alternative point of view with
important differences, and do not correlate strongly with the earlier,
one-dimensional characterisations of journal ranking.


Discussion 


Ranking and comparing the importance, prestige and popularity of scientific
journals is a far from trivial task with quite a few different available
impact measures (Garfield, 1955, 1999; Braun et al., 2006; Egghe, 2006;
Bergstrom, 2007; Bollen et al., 2005, 2007; Leydesdorff, 2007; Bollen et al.,
2009; Franceschet, 2010a,b; Glanzel, 2011; Kaur et al., 2013). However, it
seems that the overall impact of journals cannot be adequately characterised
by a single one-dimensional quality measure (Bollen et al., 2009). In this
light, our results offer an informative overview of the ranking and the
intricate relations between journals, where instead of, e.g., simply
ordering them according to a one-dimensional parameter, we organise them
into multiple hierarchies.

First, we defined a flow hierarchy between the journals based on the
m-reaching centrality in the citation network between the scientific papers.
This structure organises the journals according to their potential for
spreading new scientific ideas, with the most influential information
spreaders sorted at the top of the hierarchy. In this respect Science turned
out to be the root, followed by Nature and PNAS, and the top dozen levels of
the hierarchy were dominated by multidisciplinary, biological, biochemical
and medical journals.

We also constructed a nested hierarchy between the journals by generalising
a recent tag hierarchy extraction algorithm. In this case the journals were
organised into branches according to the major scientific fields, with a
clear separation between unrelated fields, and relatively strong mixing and
overlap between closely related fields. Mapping the different journals into
well oriented knowledge domains is a complex problem on its own (Chen et al.,
2001a,b; Borner, 2010), especially from the point of view of multi- and
interdisciplinary fields. Our nested hierarchy provides a natural tool for
the visualisation of the intricate nested and overlapping relations between
scientific fields as well. An important feature is that the organisation of
the branches roughly highlights the local hierarchy of the given field, with
usually the most prominent journal in the field serving as the local root,
and more specialised journals positioned at the bottom. Thus, zooming into a
specific field for comparing and ranking the journals that publish in the
given field becomes simple: we just have to select the corresponding branch
in the nested hierarchy.

Another interesting perspective is that, based on the position of a journal
in the nested hierarchy, we gain immediate information on its standing within
its particular field. Accordingly, we can select those journals with which a
fair comparison can be made, and we can exclude journals in far away
branches from any comparative study. Moreover, similarly to judging the
position of a journal within its specific field (a local branch), we can
also judge the standing of this sub-field in a larger scientific domain (a
main branch), and so on, and thereby compare the ranking of the different
scientific fields and sub-fields (each being composed of multiple journals).
When zooming out completely to the overall hierarchy between the journals,
Nature was observed to be in the very top position, with Science, PNAS, The
Astrophysical Journal and New Scientist forming the second level, and the
field-dependent branches starting at the third level.

The comparison between the two types of hierarchy reveals a strong
similarity accompanied by significant differences. Basically, Science, Nature
and PNAS provide the top three journals in both cases, and the top few
hundred nodes in the hierarchies have a far larger overlap than expected at
random. However, a closer level-by-level inspection showed that a very high
position in, for example, the flow hierarchy does not guarantee a similarly
outstanding ranking in the nested hierarchy, and vice versa. Both
hierarchies showed moderate correlations with the impact factor, the Scimago
Journal Rank and closeness centrality in the citation network. This supports
our view that the hierarchical organisation of scientific journals provides
an interesting alternative for the description of journal impact, which is
consistent with the previously introduced measures at large, but at the same
time shows important differences when examined in detail.

In summary, the two hierarchies we constructed offer a compound view of the
interrelations between scientific journals, and provide a higher dimensional
characterisation of journal impact instead of ranking simply according to a
one-dimensional parameter. Naturally, hierarchies between scientific
journals can be defined in other ways too (Iyengar and Balijepally, 2015).
For example, when building a flow hierarchy, the overall influence of
journals could be measured alternatively with other quantities such as the
wake-citation score (Klosik and Bornholdt, 2014), the PageRank or the
Y-factor (Bollen et al., 2007). In parallel, a nested hierarchy might also be
constructed by suitably modifying a community finding algorithm producing
inherently nested and overlapping communities such as the Infomap (Rosvall
and Bergstrom, 2008; Rosvall et al., 2013; Rosvall and Bergstrom, 2011) or
the clique percolation method (Palla et al., 2005). Another interesting
aspect we have not taken into account here is given by the time evolution of
the citation network between the journals. Obviously, the ranking of the
journals changes with time, and by treating all publications between 1975
and 2011 in a uniform framework we neglected this effect. However, the
examination of the further possibilities for hierarchy construction and the
study of the time evolution of the journal hierarchies is out of the scope
of the present work, although it provides interesting directions for future
research.


Additional information 

Supplementary information accompanies this paper. 
The authors declare no competing financial interest. 


Data availability 

The datasets analysed during the current study are available in the Web of
Science repository, owned by Thomson Reuters,
http://scientific.thomson.com/isi/, but restrictions apply to the
availability of these data, which were used under license from Thomson
Reuters, and so are not publicly available. Data are, however, available
from the authors upon reasonable request and with the permission of Thomson
Reuters.

The downloading scripts used in the study are available in the Dataverse
repository: http://dx.doi.org/10.7910/DVN/MCXTHF


Acknowledgements 

The research was partially supported by the European Union and the European
Social Fund through project FuturICT.hu (grant no:
TAMOP-4.2.2.C-11/1/KONV-2012-0013), by the Hungarian National Science Fund
(OTKA K105447) and by the EU FP7 ERC COLLMOT project (grant no: 227878).
The funders had no role in study design, data collection and analysis,
decision to publish or preparation of the manuscript.



References 

R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. 
Rev. Mod. Phys ., 74:47-97, 2002. 

M. Batty and P. Longley. Fractal Cities: A Geometry of Form and Function. 
Academic, San Diego, 1994. 

C. T. Bergstrom. Eigenfactor: Measuring the value and prestige of scholarly 
journals. C& RL News , 68:314-316, 2007. 

J. Bollen, H. V. de Sompel, J. Smith, and R. Luce. Toward alternative 
metrics of journal impact: a comparison of download and citation data. 
Inform. Process. Manag ., 41:1419-1440, 2005. 

J. Bollen, M. A. Rodriguez, and H. V. de Sompel. Journal status.
Scientometrics, 69:669-687, 2007.

J. Bollen, H. V. de Sompel, A. Hagberg, and R. Chute. A principal compo¬ 
nent analysis of 39 scientific impact measures. PLoS ONE , 4:e6022, 2009. 
doi: 10.1371/journal.pone.0006022. 

M. Bordons, M. T. Fernandez, and I. Gomez. Advantages and limitations 
in the use of impact factor measures for the assessment of research per¬ 
formance. Scientometrics , 53:195-206, 2002. 

S. P. Borgatti. The key player problem. In Dynamic Social Network Mod¬ 
elling Analysis: Workshop Summary and Papers , pages 241-252. National 
Academy of Sciences Press, 2003. 

K. Borner. Atlas of Science: Visualizing What We Know. The MIT Press,
2010.

T. Braun, W. Glanzel, and A. Schubert. A Hirsch-type index for journals.
Scientometrics, 69:169-173, 2006.

C. Cattuto, V. Loreto, and L. Pietronero. Semiotic dynamics and collabo¬ 
rative tagging. Proc. Natl. Acad. Sci. USA , 104:1461-1464, 2007. 

C. Cattuto, A. Barrat, A. Baldassarri, G. Schehr, and V. Loreto. Collective 
dynamics of social annotation. Proc. Natl. Acad. Sci. USA , 106:10511- 
10515, 2009. 

C. Chen, J. Kuljis, and R. J. Paul. Visualizing latent domain knowledge. 
IEEE T. Syst. Man Cy. C., 31:518-529, 2001a. 


C. Chen, R. J. Paul, and B. O'Keefe. Fitting the jigsaw of citations:
Information visualization in domain analysis. J. Am. Soc. Inform. Sci., 52:
315-330, 2001b.

A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and 
the prediction of missing links in networks. Nature , 453:98-101, 2008. 

B. Corominas-Murtra, C. Rodriguez-Caso, J. Goñi, and R. Sole. Measuring
the hierarchy of feedforward networks. Chaos, 21:016108, 2011.

B. Corominas-Murtra, J. Goñi, R. V. Sole, and C. Rodriguez-Caso. On the
origins of hierarchy in complex networks. Proc. Natl. Acad. Sci. USA,
110:13316-13321, 2013.

C. V. Damme, M. Hepp, and K. Siorpaes. Folksontology: An integrated 
approach for turning folksonomies into ontologies. Soc. Networks , 2:57- 
70, 2007. 

L. Egghe. Theory and practice of the g-index. Scientometrics , 69:131-152, 
2006. 

N. Eldredge. Unfinished Synthesis: Biological Hierarchies and Modern Evo¬ 
lutionary Thought. Oxford Univ. Press, New York, 1985. 

F. Floeck, J. Putzke, S. Steinfels, K. Fischbach, and D. Schoder. Imitation 
and quality of tags in social bookmarking systems - collective intelligence 
leading to folksonomies. In T. J. Bastiaens, U. Baumol, and B. J. Kramer, 
editors, On Collective Intelligence , volume 76 of Advances in Intelligent 
and Soft Computing , pages 75-91. Springer Berlin Heidelberg, 2011. 

M. Franceschet. The difference between popularity and prestige in the sci¬ 
ences and in the social sciences: A bibliometric analysis. J. Informetr ., 4: 
55-63, 2010a. 

M. Franceschet. Ten good reasons to use the Eigenfactor™ metrics. Inform.
Process. Manag., 46:555-558, 2010b.

H. Fushing, M. P. McAssey, B. Beisner, and B. McCowan. Ranking network
of captive rhesus macaque society: A sophisticated corporative kingdom.
PLoS ONE, 6:e17817, 2011.

E. Garfield. Citation indexes for science: A new dimension in documentation 
through association of ideas. Science , 122:108, 1955. 


E. Garfield. Journal impact factor: a brief review. Can. Med. Assoc. J., 
161:979-980, 1999. 

G. Ghosal, V. Zlatic, G. Caldarelli, and M. E. J. Newman. Random
hypergraphs and their applications. Phys. Rev. E, 79:066118, 2009.

W. Glanzel. The application of characteristic scores and scales to the eval¬ 
uation and ranking of scientific journals. J. Inf. Sci ., 37:40-48, 2011. 

C. Goessmann, C. Hemelrijk, and R. Huber. The formation and maintenance 
of crayfish hierarchies: behavioral and self-structuring properties. Behav. 
Ecol. Sociobiol. , 48:418-428, 2000. 

R. Guimera, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas.
Self-similar community structure in a network of human interactions. Phys.
Rev. E, 68:065103, 2003.

S. P. Harter and T. E. Nisonger. ISI's impact factor as misnomer: a proposed
new measure to assess journal impact. J. Am. Soc. Inform. Sci., 48:
1146-1148, 1997.

P. Heymann and H. Garcia-Molina. Collaborative creation of communal 
hierarchical taxonomies in social tagging systems. Technical report, Stan¬ 
ford InfoLab, 2006. 

H. Hirata and R. Ulanowicz. Information theoretical analysis of the aggre¬ 
gation and hierarchical structure of ecological networks. J. Theor. Biol., 
116:321-341, 1985. 

ISI Web of Knowledge, 2012. http://scientific.thomson.com/isi/ 
(Date of access: 01/01/2012). 

K. Iyengar and V. Balijepally. Ranking journals using the dominance
hierarchy procedure: an illustration with IS journals. Scientometrics, 102:
5-23, 2015.

K. Juszczyszyn, P. Kazienko, and M. Katarzyna. Personalized ontology- 
based recommender systems for multimedia objects. In A. Hakansson, 
R. Hartung, and N. Nguyen, editors, Agent and Multi-agent Technology 
for Internet and Enterprise Systems, volume 289 of Studies in Computa¬ 
tional Intelligence, pages 275-292. Springer Berlin Heidelberg, 2010. 

M. Kaiser, C. C. Hilgetag, and R. Kotter. Hierarchy and dynamics of neural 
networks. Front. Neuroinform., 4:112, 2010. 


J. Kaur, F. Radicchi, and F. Menczer. Universality of scholarly impact 
metrics. J. Informetr., 7:924-932, 2013. 

D. F. Klosik and S. Bornholdt. The citation wake of publications detects
Nobel laureates' papers. PLoS ONE, 9:e113184, 2014.

P. R. Krugman. Confronting the mystery of urban hierarchy. J. Jpn. Int.
Econ., 10:399-418, 1996.

R. Lambiotte and M. Ausloos. Collaborative tagging as a tripartite network.
Lect. Notes in Computer Sci., 3993:1114-1117, 2006.

R. Lambiotte and M. Rosvall. Ranking and clustering of nodes in networks 
with smart teleportation. Phys. Rev. E , 85:056107, 2012. 

D. Lane. Hierarchy, complexity, society. Springer, Dordrecht, the
Netherlands, 2006.

L. Leydesdorff. Betweenness centrality as an indicator of the interdisci¬ 
plinarity of scientific journals. J. Am. Soc. Inform. Sci., 58:1303-1319, 
2007. 

L. Leydesdorff, F. de Moya-Anegon, and V. P. Guerrero-Bote. Journal maps, 
interactive overlays, and the measurement of interdisciplinarity on the ba¬ 
sis of scopus data. arXiv:1310.4966 [cs.DL] (Date of access: 31/10/2014.), 
2013. 

L. Leydesdorff, F. de Moya-Anegon, and W. de Nooy. Aggregated 
journal-journal citation relations in scopus and web-of-science matched 
and compared in terms of networks, maps, and interactive overlays. 
arXiv:1404.2505 [cs.DL] (Date of access: 31/10/2014.), 2014. 

L. Lu, M. Medo, C. H. Yeung, Y.-C. Zhang, Z.-K. Zhang, and T. Zhou. 
Recommender systems. Phys. Rep., 519:1-49, 2012. 

H. W. Ma, J. Buer, and A. P. Zeng. Hierarchical structure and modules in
the Escherichia coli transcriptional regulatory network revealed by a new
top-down approach. BMC Bioinformatics, 5:199, 2004.

D. W. McShea. The hierarchical structure of organisms. Paleobiology, 27: 
405-423, 2001. 

J. F. F. Mendes and S. N. Dorogovtsev. Evolution of Networks: From 
Biological Nets to the Internet and WWW. Oxford Univ. Press, Oxford, 
2003. 


P. Mika. Ontologies are us: A unified model of social networks and semantics.
In International Semantic Web Conference, volume 3729, pages 522-536,
2005.

E. Mones, L. Vicsek, and T. Vicsek. Hierarchy measure for complex net¬ 
works. PLoS ONE , 7:e33799, 2012. 

M. Nagy, Z. Akos, D. Biro, and T. Vicsek. Hierarchical group dynamics in 
pigeon flocks. Nature , 464:890-893, 2010. 

M. Nagy, G. Vasarhelyi, B. Pettit, I. Roberts-Mariani, T. Vicsek, and 
D. Biro. Context-dependent hierarchies in pigeons. Proc. Natl. Acad. 
Sci. USA , 110:13049-13054, 2013. 

T. Opthof. Sense and nonsense about the impact factor. Cardiovasc. Res., 
33:1-7, 1997. 

G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping 
community structure of complex networks in nature and society. Nature, 
435:814-818, 2005. 

R. Pfitzner, I. Scholtes, A. Garas, C. J. Tessone, and F. Schweitzer. Between¬ 
ness preference: Quantifying correlations in the topological dynamics of 
temporal networks. Phys. Rev. Lett., 110:198701, 2013. 

A. Plangprasopchok, K. Lerman, and L. Getoor. A probabilistic approach 
for learning folksonomies from structured data. In Fourth ACM Inter¬ 
national Conference on Web Search and Data Mining (WSDM), pages 
555-564, 2011. 

P. Pollner, G. Palla, and T. Vicsek. Preferential attachment of communities: 
The same principle, but a higher level. Europhys. Lett., 73:478-484, 2006. 

D. Pumain. Hierarchy in Natural and Social Sciences, volume 3 of Methodos
Series. Springer Netherlands, Dordrecht, The Netherlands, 2006.

E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A.-L. Barabasi.
Hierarchical organization of modularity in metabolic networks. Science,
297:1551-1555, 2002.

M. Rosvall and C. Bergstrom. Multilevel compression of random walks
on networks reveals hierarchical organization in large integrated systems.
PLoS ONE, 6:e18209, 2011.


M. Rosvall and C. T. Bergstrom. Maps of random walks on complex networks
reveal community structure. Proc. Natl. Acad. Sci. USA, 105:1118-1123,
2008.

M. Rosvall, A. V. Esquivel, A. Lancichinetti, J. D. West, and R. Lambiotte.
Memory in network flows and its effects on community detection, ranking,
and spreading. arXiv:1305.4807 [physics.soc-ph] (Date of access:
31/10/2014.), 2013.

P. Schmitz. Inducing ontology from flickr tags. In Proc. of Collaborative Web 
Tagging Workshop at the 15th Int. Conf. on World Wide Web (WWW), 
2006. 

P. O. Seglen. Why the impact factor of journals should not be used for 
evaluating research. Brit. Med. J., 314:498-502, 1997. 

R. M. Shiffrin and K. Borner. Mapping knowledge domains. Proc. Natl. 
Acad. Sci. USA , 101:5183-5185, 2004. 

P. Spyns, A. D. Moor, J. Vandenbussche, and R. Meersman. From
Folksologies to Ontologies: How the Twain Meet. In Proceedings of OTM
Conferences, volume 1, pages 738-755, 2006.

The Scimago Journal & Country Rank, 2015. http://www.scimagojr.com 
(Date of access: 16/03/2015). 

G. Tibely, P. Pollner, T. Vicsek, and G. Palla. Ontologies and tag-statistics. 
New J. Phys ., 14:053009, 2012. 

G. Tibely, P. Pollner, T. Vicsek, and G. Palla. Extracting tag hierarchies. 
PLoS ONE , 8:e84133, 2013. 

A. Trusina, S. Maslov, P. Minnhagen, and K. Sneppen. Hierarchy measures 
in complex networks. Phys. Rev. Lett., 92:178702, 2004. 

S. Valverde and R. V. Sole. Self-organization versus hierarchy in open-source 
social networks. Phys. Rev. E , 76:046118, 2007. 

P. Velardi, S. Faralli, and R. Navigli. Ontolearn reloaded: A graph-based 
algorithm for taxonomy induction. Comput. Linguist ., 39:665-707, 2013. 

J. Voss. Tagging, folksonomy & Co - renaissance of manual indexing? 
arXiv:cs/0701072v2 [cs.IR] (Date of access: 31/10/2014.), 2007. 


J. Wickens and R. Ulanowicz. On quantifying hierarchical connections in 
ecology. J. Soc. Biol. Struct ., 11:369-378, 1988. 

E. T. Wimberley. Nested ecology. The place of humans in the ecological
hierarchy. Johns Hopkins University Press, Baltimore, MD, 2009.

WOS publication data downloading scripts, 2012.
http://hiertags.elte.hu/downloads/datasets/wos/ (Date of
access: 01/01/2012).

V. Zlatic, G. Ghosal, and G. Caldarelli. Hypergraph topological quantities 
for tagged social networks. Phys. Rev. E, 80:036118, 2009. 


Supplementary Information 


S1 The m-reach and the wake-citation-score


The flow hierarchy we study is based on the m-reach defined for the journals
in Eq. (1) in the main paper. However, there are also other alternatives for
measuring the impact of journals or individual papers by taking into account
deeper layers of other papers in the citation network. A prominent example
is given by the wake-citation-score, introduced by Klosik and Bornholdt
(2014). The basic idea here is to calculate a weighted sum over the
publications in the in-component of a given paper i up to a certain maximal
distance $d_{\max}$, where the weight of the papers decreases according to
$\alpha^d$, where $\alpha \in [0,1]$ is a constant, and $d$ denotes the
distance from i in the citation network. In the work by Klosik and Bornholdt
(2014) the impact of the publications appearing in a given year is compared,
thus, the wake-citation-score is normalised by the maximal score obtained in
the given year.

Here we compare the m-reach with the wake-citation-score in the case of
scientific journals appearing in our database. However, in our case the
normalisation by the largest wake-citation-score per year cannot be applied,
as we are interested in the overall citation of the journals in the entire
available time period. Therefore, the wake-citation-score was adapted to
journals as follows. In order to have notation consistent with the main
paper, we consider here a network where the links point from a reference
article to all papers citing it. First we carry out a weighted summation as

$$W(J) = \sum_{d=1}^{d_{\max}} \alpha^{d} \, \left| \{\, i \mid d_{\rm out}(j,i) = d,\; j \in J,\; i \notin J \,\} \right|, \qquad \text{(S1)}$$

where $W(J)$ is the (non-normalised) wake-citation-score of the journal $J$
and $d_{\rm out}(j,i)$ denotes the out-distance from paper $j$ to $i$. When
the quantity above has been evaluated for all journals in the data set, we
simply normalise the results by the largest $W(J)$ obtained.
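
The weighted summation and the normalisation can be sketched in Python as
follows. This is a minimal illustration and not the code used in the study:
the citation network is assumed to be a dictionary mapping each paper to the
papers citing it, and the out-distance of a paper from a journal is assumed
to be the minimum over the journal's papers (multi-source breadth-first
search). All function names and the toy data are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List

Graph = Dict[Hashable, List[Hashable]]  # paper -> papers citing it


def distance_histogram(graph: Graph, sources: Iterable[Hashable], d_max: int) -> List[int]:
    """counts[d] = number of papers outside `sources` at out-distance d (1..d_max)."""
    dist = {s: 0 for s in sources}  # papers of the journal itself are never counted
    queue = deque(dist)
    counts = [0] * (d_max + 1)
    while queue:
        node = queue.popleft()
        d = dist[node]
        if d == d_max:
            continue  # do not expand beyond the maximal distance
        for citing in graph.get(node, ()):
            if citing not in dist:
                dist[citing] = d + 1
                counts[d + 1] += 1
                queue.append(citing)
    return counts


def wake_score(graph: Graph, journal_papers: Iterable[Hashable], d_max: int, alpha: float) -> float:
    """Non-normalised W(J): sum over d of alpha**d times the number of papers at distance d."""
    counts = distance_histogram(graph, journal_papers, d_max)
    return sum(alpha ** d * counts[d] for d in range(1, d_max + 1))


def m_reach(graph: Graph, journal_papers: Iterable[Hashable], m: int) -> int:
    """Number of papers outside the journal reachable within at most m steps."""
    return sum(distance_histogram(graph, journal_papers, m)[1:])


def normalise(scores: Dict[str, float]) -> Dict[str, float]:
    """Divide every score by the largest one (assumes at least one positive score)."""
    largest = max(scores.values())
    return {name: value / largest for name, value in scores.items()}


# Toy citation network (hypothetical): a1 is cited by b1 and a2, a2 by b2, ...
toy_graph = {"a1": ["b1", "a2"], "a2": ["b2"], "b1": ["c1"], "b2": ["c1"], "c1": []}
journals = {"A": {"a1", "a2"}, "B": {"b1", "b2"}}

raw = {name: wake_score(toy_graph, papers, d_max=2, alpha=0.5) for name, papers in journals.items()}
normalised = normalise(raw)
```

With $d_{\max} = m$ and $\alpha = 1$ the weighted sum reduces to the m-reach
itself, which is one way to see why the two quantities are expected to be
closely related.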

In Fig. S1 we show the wake-citation-score of journals as a function of
their m-reach. In order to ensure an m-reach value falling in the unit
interval, the m-reach results were normalised by the largest m-reach
obtained from the data set (in a similar fashion to the
wake-citation-score). In Fig. S1a we display the results obtained when
$d_{\max} = m$ and $\alpha = 0.5$, while Fig. S1b shows the results for
$d_{\max} = 9$ and $\alpha = 0.5$. The obtained scatter plots suggest very
strong correlations, especially in the case of Fig. S1a. This is fully
supported by the corresponding correlation values listed in Table S1. The
dependence of the results on the parameter $\alpha$ is examined in Fig. S2.
Similarly to the previous case, a quite strong correlation can be observed
between the wake-citation-score and the m-reach for the majority of the
$\alpha$ values. The corresponding correlation values are given in Table S2.

[Figure S1 panels: a) $m = d_{\max}$, $\alpha = 0.5$; b) $d_{\max} = 9$,
$\alpha = 0.5$; horizontal axis: reach.]

Figure S1: a) The wake-citation-score of journals at $\alpha = 0.5$ as a
function of their m-reach (where each data point represents a different
journal). The different colours code the results obtained for different m
values. The $d_{\max}$ in the calculation of the wake-citation-score was set
to $d_{\max} = m$. b) The wake-citation-score of journals at $\alpha = 0.5$
and $d_{\max} = 9$ as a function of their m-reach.



S2 Setting the parameter m in the flow hierarchy 
analysis 

The maximal allowed path length in the calculation of the m-reaching
centrality, denoted by m, is an important parameter of our approach for the
analysis of the flow hierarchy between scientific journals. Since the
structure of the citation network is very far from a crystal lattice or a
regular
[Figure S2 panels: a) $m = 3$, $d_{\max} = 9$; b) $m = 4$, $d_{\max} = 9$;
horizontal axis: reach.]

Figure S2: a) The wake-citation-score of journals at $d_{\max} = 9$ as a
function of their m-reach at m = 3 (where each data point represents a
different journal). The different colours code the results obtained for
different $\alpha$ values. b) The same plot as in a) when m is set to m = 4.

       d_max = m (data of Fig. S1a)       |       d_max = 9 (data of Fig. S1b)
 m  d_max  C_Pearson C_Spearman C_Kendall | m  d_max  C_Pearson C_Spearman C_Kendall
 1    1      1.000     1.000      1.000   | 1    9      0.332     0.884      0.712
 2    2      0.999     1.000      0.994   | 2    9      0.487     0.932      0.786
 3    3      0.999     1.000      0.993   | 3    9      0.657     0.961      0.842
 4    4      0.999     1.000      0.993   | 4    9      0.788     0.978      0.884
 5    5      0.998     1.000      0.993   | 5    9      0.883     0.988      0.916
 6    6      0.997     1.000      0.991   | 6    9      0.943     0.994      0.940
 7    7      0.996     1.000      0.989   | 7    9      0.974     0.997      0.959
 8    8      0.995     1.000      0.985   | 8    9      0.988     0.999      0.973
 9    9      0.993     0.999      0.982   | 9    9      0.993     0.999      0.982

Table S1: The correlation between the m-reach and the wake-citation-score
for the data shown in Fig. S1a (left) and for the data shown in Fig. S1b
(right). (The $\alpha$ parameter in the calculation of the
wake-citation-score was set to $\alpha = 1/2$.) The first two columns list m
and $d_{\max}$, the 3rd column provides the Pearson correlation, the 4th
column gives Spearman's rank correlation coefficient, while the 5th column
contains the Kendall rank correlation coefficient.


tree, and it also has a relatively large link density, the small world effect is 
expected to take place: the average distance is low between pairs of papers 
that can be reached from one to the other following the citations. Thus, the 


 m  d_max  alpha  C_Pearson C_Spearman C_Kendall
 3    9    0.10     0.719     0.966      0.856
 3    9    0.25     0.696     0.964      0.851
 3    9    0.50     0.657     0.961      0.842
 3    9    0.75     0.619     0.957      0.832
 3    9    0.90     0.596     0.953      0.823

 m  d_max  alpha  C_Pearson C_Spearman C_Kendall
 4    9    0.10     0.840     0.982      0.899
 4    9    0.25     0.821     0.980      0.894
 4    9    0.50     0.788     0.978      0.884
 4    9    0.75     0.754     0.974      0.873
 4    9    0.90     0.734     0.971      0.864

Table S2: The correlation between the m-reach and the wake-citation-score
for the data shown in Fig. S2a (top) and for the data shown in Fig. S2b
(bottom). The first two columns give m and $d_{\max}$, and the 3rd column
shows the $\alpha$ parameter. The corresponding Pearson correlation is given
in the 4th column, followed by Spearman's rank correlation coefficient in
the 5th column, and the Kendall rank correlation coefficient in the 6th
column.


number of reachable articles from a given paper or a given journal saturates
rather fast as a function of m. This effect is shown in Fig. S3, where we
plot C_m(J), the size of the m-reachable set of papers
(defined in Eq. (1) in the main paper), divided by the size of the reachable
set of papers at unlimited path length, m → ∞, as a function of m for the
top 10 journals. According to the curves, the reach of Science and Nature
already saturates at m = 4, the reach of PNAS around m = 5, while for the
rest of the journals in the figure, the saturation occurs at higher m values.
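
The saturation curve described above can be illustrated with a small breadth-first search. This is only a minimal sketch under stated assumptions: the names `cumulative_reach`, `saturation_curve` and `successors` are hypothetical, the links are assumed to point from a cited paper to the papers citing it, and the reach is counted from a plain set of source papers, which may differ in normalisation from the definition of C_m(J) in Eq. (1) of the main paper.

```python
def cumulative_reach(successors, sources):
    """Cumulative number of distinct papers reached after 1, 2, ... steps.

    `successors[p]` lists the papers that follow p along a link
    (assumed here: the papers citing p). The list stops when a step finds
    nothing new, so its last entry is the reach at unlimited path length.
    """
    seen = set(sources)
    n_sources = len(seen)
    frontier = list(seen)
    sizes = []
    while frontier:
        next_frontier = []
        for node in frontier:
            for child in successors.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        if next_frontier:
            sizes.append(len(seen) - n_sources)
        frontier = next_frontier
    return sizes


def saturation_curve(successors, sources):
    """Reach within m steps divided by the unlimited reach, for m = 1, 2, ..."""
    sizes = cumulative_reach(successors, sources)
    return [s / sizes[-1] for s in sizes] if sizes else []


# Toy network (hypothetical): a -> b, a -> c, b -> d, c -> d, d -> e
toy = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
curve = saturation_curve(toy, ["a"])
```

In this toy case the curve reaches 1 at m = 3; in the real network the same ratio is what flattens out around m = 4 or m = 5 for the largest journals.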

The two main reasons for the saturation effect are the exponential increase
of the number of reachable papers as a function of m, and the finite
system size. Since the saturation effect takes place at different m values for
the different journals, in order to provide a fair comparison between their
influence based on the “information spreading ability”, we should take an m
value below the saturation of all journals, that is, m < 4.
When m ≥ 4, the already saturated journals have a disadvantage, as their
m-reach is already affected by the finite system size, while the
not yet saturated journals do not suffer from this problem.

Keeping m smaller than 4 is also consistent with the general intuition
about the spread of information on the citation network: a direct
citation usually corresponds to a strong interrelation between the two
papers, which are likely to be focused on the same field. However, as we
increase the distance between the papers in the citation network, the
relatedness between them usually drops, e.g., a pair of papers 4 citation steps
away from each other can very easily belong to completely different fields.

Figure S3: The size of the m-reachable set, denoted by C_m(J), divided by
the size of the reachable set of papers at unlimited path length, as a function
of m for the top 10 journals according to the flow hierarchy.

Based on the above, we have chosen to set m = 3 in the flow
hierarchy analysis outlined in the main paper. According to Fig. S3, on the
one hand this way we avoid the saturation effect present at m ≥ 4. On
the other hand, we also allow multiple steps in the information spread over
the system, with a limited path length where we can still assume at least
a weak relatedness between the papers at the opposite ends of the citation
path. A further advantage of this choice is that the variance of the C_m(J) values
is significantly larger at m = 3 than at, e.g., m = 5; thus,
a ranking of the journals based on C_m(J) is much more robust at
m = 3.

S3 Flow hierarchy at different m values 

In order to complete the analysis of the effect of the choice of the
parameter m on the flow hierarchy, in this Section we show results obtained
when m is set to lower values compared to the optimal m = 3 case. In
Fig. S4 we present the top of the hierarchy at m = 2. Apparently, the very
peak of the hierarchy looks very similar to the m = 3 case, given in
Fig. 2 in the main paper. I.e., Science is on the top, followed by Nature,
with PNAS coming 3rd, while the New England Journal of Medicine and
Lancet are just below the three major interdisciplinary journals. However,
as we go deeper down the hierarchy levels, the difference between the two
hierarchies becomes visible. E.g., the top chemical and physical journals
gain a relatively higher position in Fig. S4 compared to Fig. 2 in the main
paper.



Figure S4: The top of the flow hierarchy when m is set to m = 2, and
the standard deviation of C_m within a level is at most 0.13 · σ(C_m), where
σ(C_m) denotes the standard deviation of C_m over all journals.

When switching to m = 1, the discrepancies become much stronger, as
shown in Fig. S5. In this case the top position is shared by Nature and
Science, while PNAS is placed on the 2nd level, followed by the Journal of
Biological Chemistry on the 3rd level. Although the New England Journal of
Medicine and Lancet are still just below the peak on the 4th and 5th levels,
respectively, the Journal of the American Chemical Society and Physical
Review Letters overtake the rest of the biological, biochemical and
medical journals. In parallel, the fraction of physical and chemical journals is
significantly higher in the overall picture of the top of the hierarchy compared
to the m = 3 case shown in Fig. 2 in the main paper.

Figure S5: The top of the flow hierarchy when m is set to m = 1, and
the standard deviation of C_m within a level is at most 0.13 · σ(C_m), where
σ(C_m) denotes the standard deviation of C_m over all journals.


We can also calculate the correlation between C_m at the optimal m = 3
setting, and the C_m obtained for lower m values. The Pearson correlation
coefficient between the results obtained at m = 3 and m = 2 is
C_Pearson = 0.922, while the Spearman’s rank correlation for the same data
is C_Spearman = 0.991, indicating a quite high similarity between the two
rankings. However, when lowering m to m = 1, the corresponding correlation
coefficients decrease to C_Pearson = 0.724 and C_Spearman = 0.955.
Based on the above, the structure of the flow hierarchy is moderately robust
against changes in the parameter m. I.e., when decreasing the length of the
maximally allowed citation chains by one, the overall picture of the top of
the hierarchy remains the same, with some differences becoming apparent in
a level by level comparison. However, if we apply a more drastic change in
m, the differences become stronger, also affecting the very top ranks
of the hierarchy.
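
For readers who wish to reproduce this kind of comparison, the two coefficients quoted above can be computed as in the following minimal sketch. It uses only the Python standard library; the function names are hypothetical, and assigning tied values their average rank is an assumption, since the tie handling used for the reported numbers is not stated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (not from the paper): a monotonic but non-linear relation
reach_a = [1, 2, 3, 4, 5]
reach_b = [1, 4, 9, 16, 25]
```

On the toy data the rank correlation is exactly 1 while the Pearson coefficient is slightly lower, mirroring the pattern above where C_Spearman stays close to 1 even when C_Pearson drops.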


S4 Aggregated citation network 


An alternative option for analysing the hierarchical relations of scientific
journals based on publication data is to first construct a citation network
between journals instead of individual papers, and in the next step apply
the hierarchy related methods on the level of this aggregated network. The
weights of the directed links between the journals in this framework correspond
to the accumulated number of papers appearing in the “target”
journal citing at least one paper appearing in the “source” journal. The
advantage of this approach is that journals are represented by single nodes
in the obtained network instead of groups of nodes as in the case of the citation
network between papers. However, a considerable drawback is that scientific
citation networks have a memory (Rosvall et al., 2013): e.g., a paper citing
mostly biological articles is likely to be cited mainly by biological papers
as well. When calculating, e.g., the reaching centrality of a journal in the
aggregated network we neglect this memory effect, and thus, the result can
show large deviations compared to the value obtained in the original citation
network between papers. This effect is also very closely related to the
distortions that can be caused by time aggregation in temporal networks, as
pointed out by Pfitzner et al. (2013). Nevertheless, it is still worth analysing
the hierarchical properties of the aggregated citation network between journals
for comparison with the results shown in Fig. 2 in the main paper,
bearing in mind that the reaching centralities obtained here are somewhat
distorted.
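
The construction of the link weights can be sketched as follows. This is an illustration under stated assumptions rather than the actual processing pipeline: the function name `aggregate_citations` and the input format (a list of citing/cited paper pairs plus a paper-to-journal map) are hypothetical, and citations within the same journal are kept as self-links, which the text does not specify.

```python
from collections import Counter


def aggregate_citations(citations, journal_of):
    """Collapse a paper-level citation list into weighted journal links.

    citations  : iterable of (citing_paper, cited_paper) pairs
    journal_of : dict mapping each paper to its journal
    Returns a Counter keyed by (source_journal, target_journal), where the
    weight is the number of papers in the target journal citing at least
    one paper of the source journal.
    """
    cited_journals = {}
    for citing, cited in citations:
        cited_journals.setdefault(citing, set()).add(journal_of[cited])

    weights = Counter()
    for citing, sources in cited_journals.items():
        target = journal_of[citing]
        for source in sources:
            # each citing paper contributes at most once per source journal
            weights[(source, target)] += 1
    return weights


# Toy example (hypothetical papers and journals)
journal_of = {"p1": "A", "p2": "A", "q1": "B", "q2": "B"}
citations = [("q1", "p1"), ("q1", "p2"), ("q2", "p1"), ("p2", "p1")]
weights = aggregate_citations(citations, journal_of)
```

Here q1 cites two papers of journal A but still adds only one unit to the link from A to B, in line with the "at least one paper" rule.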

In order to concentrate only on the highly significant connections between
the journals, we applied a weight threshold, w*, taking into account
only the links with a weight w > w*. The weights of the links are distributed
according to a power-law, so no plausible threshold can be inferred by simply
studying their distribution. Therefore, the final threshold was chosen so that the
extent of hierarchy in the resulting network is maximal. A natural measure
for the hierarchy is given by the Global Reaching Centrality (GRC) (Mones
et al., 2012), reflecting the inhomogeneity of the reach of the individual
nodes. The mathematical definition of the GRC is given by


GRC_m = \frac{1}{N-1} \sum_{i} \left[ \max\{C_m(i)\} - C_m(i) \right],    (S2)

where max{C_m(i)} is the largest centrality in the network and N is the
number of nodes.

Based on the above, we rejected links with a weight lower than w* =
K_w⟨w⟩, where ⟨w⟩ denotes the average weight, allowing for different values of
K_w. Afterwards, the centralisation of the m-reach centrality was calculated
according to (S2). In Fig. S6 we show the obtained GRC_m for m = 3 as
a function of K_w, with a clear global maximum at K_w = 10. Thus, we
applied this value in the investigation of the flow hierarchy at the level of
the aggregated network between journals.
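
The threshold selection can be sketched in a few lines. The sketch below rests on several assumptions that are not confirmed by the text: Eq. (S2) is taken in the usual form of Mones et al. (the sum of max{C_m(i)} − C_m(i) divided by N − 1), C_m(i) is taken as the fraction of the other nodes reachable within m steps, and the names `m_reach`, `grc` and `best_threshold` are hypothetical.

```python
def m_reach(successors, node, m):
    """Number of nodes reachable from `node` in at most m directed steps."""
    seen = {node}
    frontier = [node]
    for _ in range(m):
        next_frontier = []
        for current in frontier:
            for child in successors.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return len(seen) - 1


def grc(weights, k_w, m=3):
    """GRC_m of the network that keeps only links with w > k_w * <w>."""
    mean_w = sum(weights.values()) / len(weights)
    successors, nodes = {}, set()
    for (source, target), w in weights.items():
        if w > k_w * mean_w:
            successors.setdefault(source, []).append(target)
            nodes.update((source, target))
    n = len(nodes)
    if n < 2:
        return 0.0
    # assumed normalisation: fraction of the other nodes reached within m steps
    c = [m_reach(successors, v, m) / (n - 1) for v in nodes]
    c_max = max(c)
    return sum(c_max - value for value in c) / (n - 1)


def best_threshold(weights, candidates, m=3):
    """Candidate K_w value giving the largest GRC_m."""
    return max(candidates, key=lambda k: grc(weights, k, m))


# Toy star network (hypothetical): one hub linked to three leaves
star = {("hub", "a"): 5, ("hub", "b"): 5, ("hub", "c"): 5}
```

A star is maximally hierarchical under this measure (GRC_m = 1), whereas a threshold that removes every link leaves nothing to rank, so the scan over K_w favours the smaller candidate in this toy case.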



Figure S6: The m-reaching centrality in networks obtained by different
edge-weight cutoffs. After the filtering of the edges, only those with a weight larger
than K_w⟨w⟩ are kept together with the corresponding nodes. The inset
shows the number of nodes and edges as a function of the cutoff threshold.
(Horizontal axis: edge weight threshold.)

In Fig. S7 we show the top journals according to the reaching centrality
within m = 3 steps. Similarly to Fig. 2 in the main paper, the hierarchy
levels are obtained by aggregating the journals into subsets with a standard
deviation of C_m smaller than 0.13 · σ(C_m), where σ(C_m) denotes the standard
deviation of C_m over all journals. According to Fig. S7, Science is the most
influential journal according to the flow hierarchy, followed by Nature, with
PNAS and Physical Review Letters forming the 3rd level. Interestingly,
physical journals dominate the next few levels, with Physical Review A on
the 4th level, Journal of Applied Physics on the 5th level and Physical Review
B and Physical Review E providing the 6th level, followed by Applied Physics
Letters on the 7th level.

This tendency is rather different from the results obtained from the citation
network between individual papers (Fig. 2 in the main paper), where
medical, biological and biochemical journals occupied the top of the hierarchy.
A plausible explanation is that when collapsing all the papers appearing
in a given journal into a single node, we lose the information about the
number of publications appearing in the journal. Since medical, biological
and biochemical papers tend to cite mainly within these three fields, the
reach of related journals is strongly reduced when switching from the network
on the level of publications to the network between journals: the very
high publication rate of these journals provides a high reach in the original
network between papers, while the collapse of the vast number of papers
appearing in these journals onto a single node in the aggregated network
cancels out this effect. In contrast, papers appearing in physical journals
have a somewhat larger likelihood of citing publications from other fields;
thus, the aggregation of the papers into nodes representing journals does
not have such a drastic effect on the reach.

Figure S7: Top of the flow hierarchy according to the m-reaching centrality
C_m at m = 3, based on the aggregated network between the journals. The
standard deviation of C_m within the individual hierarchy levels is at most
0.13 · σ(C_m), where σ(C_m) denotes the standard deviation of C_m over all
journals.

S5 Changing the width of the levels in the flow hierarchy


The levels in the flow hierarchy are defined based on the standard deviation
of the m-reach, i.e., σ(C_m) within a level cannot exceed a fixed fraction of
σ(C_m) calculated over the entire set of journals. For simplicity, let us denote
this parameter by ω; thus, the maximal standard deviation allowed for journals falling
in the same level in the hierarchy is given by ω σ(C_m), where ω ∈ [0, 1].
Naturally, the larger ω we choose, the more journals we find in a hierarchy
level on average; thus, the choice of this parameter has an effect on the
overall shape of the hierarchy we obtain.
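
One simple way to turn this criterion into levels is a greedy pass over the journals sorted by their m-reach. The sketch below is an assumption about how such a grouping could be carried out, not a description of the procedure actually used: the function name `flow_levels`, the greedy strategy and the use of the population standard deviation are all hypothetical choices.

```python
from statistics import pstdev


def flow_levels(reach, omega):
    """Group journals into levels by descending m-reach.

    reach : dict mapping journal -> C_m value
    omega : allowed fraction of the overall standard deviation per level
    A journal joins the current level as long as the standard deviation of
    the level stays at or below omega * sigma(C_m); otherwise it opens a
    new level.
    """
    sigma_all = pstdev(list(reach.values()))
    ranked = sorted(reach, key=reach.get, reverse=True)
    levels, current = [], []
    for journal in ranked:
        trial = current + [journal]
        too_wide = pstdev([reach[j] for j in trial]) > omega * sigma_all
        if current and too_wide:
            levels.append(current)
            current = [journal]
        else:
            current = trial
    if current:
        levels.append(current)
    return levels


# Toy values (not from the paper)
toy_reach = {"A": 1.0, "B": 0.98, "C": 0.5, "D": 0.49, "E": 0.1}
toy_levels = flow_levels(toy_reach, 0.13)
```

With these toy values, ω = 0.13 yields three levels; raising ω can only merge neighbouring journals into wider levels, which is the effect discussed in the rest of this Section.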

In the main paper we have shown the results for ω = 0.13. Here we
provide visualisations of the top of the flow hierarchy at different ω values
as well. In Fig. S8 we present the results for ω = 0.08, while in Fig. S9 we
display the flow hierarchy at ω = 0.2.

Figure S8: The top of the flow hierarchy at m = 3 when the standard
deviation of C_m within a level is at most 0.08 · σ(C_m), where σ(C_m) denotes
the standard deviation of C_m over all journals.

Although the width of the levels is sensitive to the choice of ω in general,
the very top of the hierarchy is very robust. I.e., the first 6 levels are exactly
the same in Figs. S8, S9 and Fig. 2 in the main paper. When we go deeper
in the hierarchy, naturally, the actual choice of ω starts to make a
difference, resulting in a narrow, steep overall shape at ω = 0.08, and a
wider, more gradual overall shape at ω = 0.2.

A very useful property of journal hierarchies is that they provide an
instant and simple visualisation of the journal rankings. However, when ω
is low, the widening of the levels is slow as we go deeper in the hierarchy
from the root, and therefore, the total number of journals that can be fitted
in a picture of the top of the hierarchy is relatively low: when reaching the
maximum level depth allowed by the height of the picture (and the condition
that the journal names should still be readable), the bottom levels are still
rather narrow, as in the case of Fig. S8.

In contrast, when ω is large, the levels become wide fast as a function
of the level depth. Thus, in this case we reach the maximally allowed width
of the picture at a relatively low level depth from the root, and again, the
total number of journals appearing in the visualisation is relatively low, in
a similar fashion to Fig. S9. Based on the above, our choice of ω = 0.13 used in
the main paper corresponds to an optimal choice, allowing a relatively
large number of journals in the visualisation of the top of the hierarchy.

Figure S9: The top of the flow hierarchy at m = 3 when the standard
deviation of C_m within a level is at most 0.2 · σ(C_m), where σ(C_m) denotes
the standard deviation of C_m over all journals. Due to the rapidly increasing
number of nodes per level, the journals on level eight and below are organised
into double rows.

S6 Jaccard similarity


In the main paper we compare the flow- and the nested hierarchies
based on the Jaccard similarity between the sets of aggregated journals from
the root down to a certain level ℓ. However, since the number of levels in the
flow- and the nested hierarchy is different, we need to introduce a
separate similarity measure for each hierarchy, as given in Eqs. (2-3) in the
main paper. In Fig. 4 in the main paper we displayed J_f(ℓ_f) for the flow
hierarchy; here in Fig. S10 we show the corresponding J_n(ℓ_n) for the nested
hierarchy. The behaviour is very similar to that of the J_f(ℓ_f) measure.


Figure S10: The Jaccard similarity J_n(ℓ_n) defined in Eq. (3) in the main
paper, as a function of the level depth ℓ_n in the nested hierarchy.
(Horizontal axis: number of levels.)


S7 Comparing the hierarchies by the Kendall-tau distance

The studied flow- and the nested hierarchies can also be compared according
to partial order distance measures. The basic idea is to first map the given
hierarchy onto a partial order, given by a domain of candidates C and a
relation ≺_κ obeying the following conditions:

• ≺_κ is irreflexive, i.e., ∀x ∈ C: x ⊀_κ x,

• ≺_κ is asymmetric, i.e., x ≺_κ y ⇒ y ⊀_κ x,

• ≺_κ is transitive, i.e., x ≺_κ y ∧ y ≺_κ z ⇒ x ≺_κ z.

The intuitive interpretation of the relation x ≺_κ y is that x is ranked before
y, or x is preferred over y. A pair of candidates is unrelated (incomparable)
if (x ⊀_κ y) ∧ (y ⊀_κ x). Naturally, the journals correspond to the
candidates, and a given journal is ranked before all of its descendants in
the hierarchy. However, in the case of the flow hierarchy only the levels of the
hierarchy are given; the ancestor-descendant relations between the journals
are not specified. Thus, the flow hierarchy actually corresponds to a
bucket order, where the “buckets” are given by the hierarchy levels, and we
assume that a journal in a given bucket precedes all journals in lower
buckets, and journals in the same bucket are all equal to each other.

The Kendall-tau distance measure was originally defined for total orders,
where all pairs of candidates are comparable. In this case the distance
measure corresponds to the number of inversions needed to convert one
total order to the other one. It can be normalised by dividing by the total
number of relations, resulting in a value between 0 and 1. In contrast to
similarity measures, for an identical pair of total orders the Kendall-tau
distance is 0, while for maximally different total orders it is 1. Here we
adapt this concept to the problem of comparing a bucket order (the flow
hierarchy) and a partial order (the nested hierarchy).

The basic idea is to iterate over all possible pairs of journals and compare
their ordering in the bucket order and in the partial order. Whenever we
observe a mismatch between the two orderings, we increase the distance score
D by one. The detailed rules for updating D are the following:

• D is left unchanged if

  - x ≺ y according to both the bucket order and the partial
    order,

  - x and y are unrelated according to the partial order (are on
    different branches).

• D is increased by one in all other cases, that is

  - if x ≺ y according to the bucket order and y ≺ x according to
    the partial order,

  - if x = y according to the bucket order, while x ≺ y or y ≺ x
    according to the partial order.

In order to normalise D we have to divide the obtained result by the maximal
number of possible mismatches, which is given by the total number of
comparable pairs in the partial order. (I.e., if the given pair is unrelated, D
is left unchanged irrespective of the ordering in the bucket order.)
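
The update rules and the normalisation above can be written down compactly. The following is a minimal sketch under stated assumptions: the nested hierarchy is assumed to be a tree given by a child-to-parent map, the flow hierarchy is given by a level index per journal (0 for the top level), and the names `ancestors` and `bucket_partial_distance` are hypothetical.

```python
from itertools import combinations


def ancestors(parent, node):
    """All ancestors of `node` in a tree given as a child -> parent map."""
    found = set()
    while node in parent:
        node = parent[node]
        found.add(node)
    return found


def bucket_partial_distance(level, parent):
    """Normalised Kendall-tau distance between a bucket and a partial order.

    level  : dict journal -> level index in the bucket order (0 = top)
    parent : dict journal -> parent journal in the partial order
    """
    anc = {journal: ancestors(parent, journal) for journal in level}
    mismatches = comparable = 0
    for x, y in combinations(level, 2):
        if x in anc[y]:
            first, second = x, y      # x is ranked before y in the partial order
        elif y in anc[x]:
            first, second = y, x
        else:
            continue                  # unrelated pair: D is left unchanged
        comparable += 1
        if level[first] >= level[second]:
            mismatches += 1           # tied or inverted in the bucket order
    return mismatches / comparable if comparable else 0.0


# Toy hierarchies (hypothetical): A is the root, B and D its children, C under B
parent = {"B": "A", "C": "B", "D": "A"}
same = bucket_partial_distance({"A": 0, "B": 1, "C": 2, "D": 1}, parent)
tied = bucket_partial_distance({"A": 0, "B": 1, "C": 1, "D": 0}, parent)
```

In the toy example four pairs are comparable in the tree; the second bucket order ties two of them, giving a distance of 0.5, while the first agrees on all of them and gives 0.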

By applying the above comparison method, our result for the normalised
Kendall-tau distance between the flow hierarchy and the nested hierarchy
is D = 0.1594. For comparison we also calculated the mean distance between
randomised hierarchies. The randomisation was carried out by simply
swapping pairs of journals in a given hierarchy at random, keeping the
structure of the hierarchy (number of levels, number of nodes in a level,
etc.) fixed. The average distance and standard deviation were
⟨D_rand⟩ = 0.8021 ± 0.0169. Thus, the two examined hierarchies are significantly
closer to each other than expected at random, i.e., the z-score for the
distance is −38.03626.
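
The randomisation test can be sketched as follows. This is an illustration only: the names `shuffled_levels` and `z_score` are hypothetical, a full random permutation of the journal labels is used as a stand-in for repeated random pair swaps, and the population standard deviation is an assumption. The last line recomputes the z-score from the rounded values quoted above, which gives about −38.03, in agreement with the reported value up to the rounding of the inputs.

```python
import random
from statistics import mean, pstdev


def shuffled_levels(level, rng):
    """Randomise a hierarchy by permuting journal labels.

    The structure (number of levels, number of journals per level) is kept,
    only the assignment of journals to positions changes.
    """
    journals = list(level)
    permuted = journals[:]
    rng.shuffle(permuted)
    return {new: level[old] for old, new in zip(journals, permuted)}


def z_score(observed, random_values):
    """Standard score of the observed distance against a random sample."""
    return (observed - mean(random_values)) / pstdev(random_values)


# Toy bucket order (hypothetical), shuffled with a fixed seed
toy_level = {"A": 0, "B": 1, "C": 1, "D": 2}
randomised = shuffled_levels(toy_level, random.Random(0))

# z-score recomputed from the rounded values quoted in the text
z_from_text = (0.1594 - 0.8021) / 0.0169
```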


S8 Comparing the hierarchies with other impact 
measures 

We also compare the hierarchies we constructed with more traditional impact
measures. The flow hierarchy is defined based on the m-reach of the
journals given in Eq. (1) in the main paper. This quantity can be directly
compared to any traditional impact measure in a simple fashion. Along this
line, in Figs. S11 and S12 we show the 2012 journal impact factor obtained from
Thomson Reuters (2015) as a function of the m-reach. The scatter plot
suggests moderate correlations, which is supported by the C_Pearson = 0.498
Pearson’s correlation coefficient and C_Spearman = 0.646 Spearman’s rank
correlation coefficient. A few outlier journals are identified in Fig. S11, e.g., CA:
A Cancer Journal for Clinicians has a rather large impact factor, accompanied
by a relatively low reach, whereas in contrast PNAS has a considerably
lower impact factor and a quite large reach.

In Figs. S13 and S14 we show the latest Scimago Journal Rank (The Scimago
Journal & Country Rank) as a function of the m-reach. Similarly to the impact
factor, the scatter plot reveals a moderate correlation between the
two quantities, consistent with the C_Pearson = 0.460 Pearson’s correlation
coefficient and C_Spearman = 0.584 Spearman’s rank correlation coefficient
obtained for the shown data.

Finally, in Fig. S15 we show the closeness of the journals calculated in the
aggregated citation network as a function of the m-reach. Similarly to the
previous impact measures, we can observe a moderate correlation between
the two quantities. The corresponding Pearson’s correlation coefficient is
C_Pearson = 0.466, and the Spearman’s rank correlation coefficient
is C_Spearman = 0.823.

Figure S11: The 2012 journal impact factor as a function of the m-reach,
given in Eq. (1) in the main paper, for the journals in our database.
The m-reach was calculated at m = 3 and was normalised by the largest
m-reach value in the sample. Journals with the highest impact factor and
m-reach are highlighted with green and blue circles, respectively. In addition,
two more journals appearing high in both the flow- and the nested hierarchy
are highlighted in purple and grey.

Figure S12: The same scatter plot as in Fig. S11 on a logarithmic scale. The
2012 journal impact factor is plotted as a function of the m-reach of the
journals, normalised by the largest m-reach value in the sample.

In addition, we may also apply the concept of the Kendall-tau distance
defined in Sect. S7 for comparing the impact factor or the Scimago Journal
Rank with the flow hierarchy. The resulting D = 0.072 for the distance
between the ranking by the impact factor and the flow hierarchy, and also
the D = 0.086 between the Scimago Journal Rank and the flow hierarchy,
correspond to larger distances than the D obtained for the distance
between the nested hierarchy and the flow hierarchy. However, they are also
significantly lower compared to the distance between randomised rankings,
which gives ⟨D_rand⟩ = 0.198 ± 0.003 both for the impact factor and the Scimago
Journal Rank. We note that there are several ties in both the journal impact
factors and the Scimago Journal Ranks; therefore, the structure of the
corresponding partial orders is not exactly like a linear chain. Accordingly,
in theory the distance between the randomised flow hierarchy and the randomised
impact factor rankings can be different from the distance between the
randomised flow hierarchy and the randomised Scimago Journal Ranks.

Figure S13: The Scimago Journal Rank as a function of the m-reach, C_m(J),
for the journals in our database. The m-reach was calculated at m = 3 and
was normalised by the largest m-reach value in the sample. Journals with
the highest Scimago Journal Rank and m-reach are highlighted with green
and blue circles, respectively. In addition, two more journals appearing high
in both the flow- and the nested hierarchy are highlighted in purple.

Figure S14: The same scatter plot as in Fig. S13 on a logarithmic scale. The
Scimago Journal Rank is plotted as a function of the m-reach of the journals
normalised by the largest m-reach value in the sample.
(Horizontal axis: reach at m = 3.)

Figure S15: The closeness value of the journals in the citation network as
a function of the m-reach, C_m(J), for the journals in our database. The
m-reach was calculated at m = 3 and was normalised by the largest m-reach
value in the sample.

Figure S16: The 2012 journal impact factor (red circles) and the Scimago
Journal Ranks (blue crosses) as a function of the level depth in the nested
hierarchy, where the level depth was normalised by the maximal level depth.
Note that the scale for the Scimago Journal Ranks is given on the vertical
axis on the right.



Figure S17: The same scatter plot as in Fig. S16 on a logarithmic scale. The
2012 journal impact factor is shown by red circles, while the Scimago Journal
Ranks are marked by blue crosses. Both impact measures are plotted as a
function of the normalised level depth in the nested hierarchy. Note that
the scale for the Scimago Journal Ranks is given on the vertical axis on the
right. (Horizontal axis: level / max level in the nested hierarchy.)


In the case of the nested hierarchy, in Figs. S16 and S17 we show the journal
impact factors and the Scimago Journal Ranks as functions of the level
depth. These scatter plots indicate a moderate negative correlation between
the traditional impact measures and the level depth, as expected: journals
with high impact factor and high Scimago Journal Rank tend to be placed
on the top levels, whereas the impact measures at the lower levels in the
hierarchy seem to be lower on average. This is supported by the C_Pearson =
−0.272 Pearson correlation and the C_Spearman = −0.211 Spearman’s rank
correlation coefficient between the impact factor and the nested hierarchy
level depth, and the C_Pearson = −0.289 and C_Spearman = −0.314 correlation
coefficients between the Scimago Journal Rank and the nested hierarchy level
depth. In addition, in Fig. S18 we show the closeness value of the journals
in the aggregated citation network as a function of their normalised level
depth in the nested hierarchy. According to the plot, the two quantities show
moderate correlation, with a corresponding Pearson’s correlation coefficient
of C_Pearson = −0.449.

Although the magnitude of these correlation coefficients is somewhat
smaller than in the case of the flow hierarchy, the nested hierarchy is still about
10 times closer to the ranking by the impact factor or by the Scimago Journal
Ranks than a randomised ranking according to the Kendall-tau distance: we
obtained a D = 0.065 distance between the nested hierarchy and the ranking
according to the impact factor, in contrast to the ⟨D_rand⟩ = 0.60 ± 0.025
average and standard deviation for the distance between two corresponding
randomised rankings. In the case of the Scimago Journal Rank, the Kendall-tau
distance is D = 0.057, while the average and standard deviation for the
randomised rankings is ⟨D_rand⟩ = 0.460 ± 0.015.



Figure S18: The closeness value of the journals in the citation network as
a function of the level depth in the nested hierarchy, where the level depth
was normalised by the maximal level depth.
(Horizontal axis: level / max level in the nested hierarchy.)